ABSTRACT

Spam messages are an increasing threat to mobile com- 
munication. Several mitigation techniques have been pro- 
posed, including white and black listing, challenge-response 
and content-based filtering. However, none are perfect and 
it makes sense to use a combination rather than just one. 
We propose an anti-spam framework based on the hybrid of 
content-based filtering and challenge-response. There is a
trade-off between the accuracy of anti-spam classifiers and the
communication overhead. Experimental results show how,
depending on the proportion of spam messages, different fil- 
tering parameters should be set. 
1. INTRODUCTION

Short Message Service (SMS) and Multimedia Messaging
Service (MMS) are popular means of mobile communication.
Texting costs have decreased continuously over the years
(to the point of free texting), whereas the bandwidth
for communication has increased dramatically. Such trends
have attracted a large number of phishing and spamming
attacks using text messages. In particular, spam messages
containing pornographic or promotional material are an
emerging phenomenon, and they have caused a significant level
of inconvenience for users. These are now prevalent in Korea,
Japan and China and are likely to spread to other countries
where mobile communication is popular. Statistics for 2008 show
that a user in China, on average, receives 8.29 SMS spam messages
per week pp. 

Much of the existing research into anti-spam solutions, 
however, has focused on the protection of emails in the con- 
text of the Internet. Some of the popular methods include 
white and black listing, digital signature, postage control, 
address management, collaborative and content-based filtering
[2], [3], [4], [6]. Different characteristics between emails
and text messages make it harder to apply such approaches
directly in mobile networks and to analyze the results.
Each approach has its own set of drawbacks and does
not improve much if used alone. For example, the extra 
traffic required to perform challenge-response needs to be 
minimized (or needs to be compensated for) as it is more 
expensive to use the bandwidth in mobile networks. This
issue can be addressed by content-based filtering:
obvious spam would be filtered first to reduce the number
of messages subject to challenge-response. In this paper, we
propose an anti-spam framework based on the combination
of these two methods. We attempt to curb high-volume
spamming and, as a result, minimize the extra bandwidth
that would be required. Given a reasonable filtering
algorithm, we show that, ultimately, less bandwidth will be
used with our method than when high-volume spam is freely
allowed.

The remainder of the paper is organized as follows. In
Section 3 we describe a hybrid spam filtering framework.
Section 4 evaluates the performance of our hybrid method
based on two measures, the traffic usage and the accuracy.
Finally, in Section 5 we discuss the contribution of this
paper and the remaining work.

2. EXISTING SOLUTIONS 

Content-based filtering solutions have proved to be
effective for emails, which are typically larger in size
than text messages. Abbreviations and acronyms
are used more frequently in SMS, and they increase the level
of ambiguity. The authors of [7] propose binary classification
and filtering methods for short messages in a Bayesian scheme.
Classification rules are defined and extended from general
patterns identified in past spam. However, such adaptive
schemes are weak against innovative attacks where strategies
constantly evolve to manipulate classification rules. Filtering
alone will not be sufficient to detect spam.

Many anti-spam solutions [1, 8] have been suggested based
on a challenge-response protocol. A message sender needs
to verify that they are a legitimate sender by answering the
challenge message (e.g. through a web interface) before their
message is forwarded to the recipient. The sender
authenticates themselves as a human user by answering a simple
Turing test that a machine cannot easily solve.
Nevertheless, the protocol has often been criticized for the
extra user interaction and traffic it requires. There might
also be a significant overhead in storing and managing
challenge messages.

Our goal is to develop a solution that ultimately minimizes
the usage of network bandwidth by discouraging high-volume
spamming. We believe the extra traffic required to
perform challenge-response can be compensated for if a large
number of spamming attacks can be prevented as a result.
In our approach the challenge-response protocol classifies
machine-generated spam. We also use the filtering method
to reduce the number of messages that need to be verified.
Simulation results in later sections show that this hybrid
approach is capable of controlling high-volume spam and the
traffic usage.

3. A HYBRID FRAMEWORK 

Text messages are classified into three different regions 
using a content-based filtering method: normal, uncertain 
and spam. A filtering method cannot deal with uncertain 
messages; therefore, we use a challenge-response protocol 
to classify the uncertain messages into normal and spam 
regions (see Fig. 1). We assume that the majority of spam 
messages are generated by machines. 

A human verification mechanism (in the form of challenge- 
response) is added to a common filtering scheme to detect 
whether an uncertain message falls into the normal or the 
spam region. A message center (owned by an operator) 
sends a challenge query to check if the sender is an indi- 
vidual or a machine. The sender responds by answering the 
query and the center compares the returned value against 
the known correct value. If the values match, the message 
is classified as normal; otherwise, as spam. We are interested in
further classification of this uncertain region. A message
center is given the full responsibility of running the
framework for the following reasons. Firstly, it should reduce
the traffic usage by filtering spam as early as possible,
before forwarding messages to the recipient. Secondly, using the
challenge-response protocol, the center will be able to
collect an enormous amount of sample data in real time; these
can be used to develop highly effective classifiers and
continuously improve the performance of filtering algorithms.
Lastly, it would be difficult to install and maintain
homogeneous anti-spam software on all mobile devices; instead
we rely on one solution deployed in a message center.

Figure 1: Hybrid Spam Filtering Overview

3.1 Uncertain Region 

If we assume there are only two regions, normal and spam,
a filtering method will use binary classification. Suppose
that we have a probabilistic model for the anti-spam
classifier as a posterior distribution $\Pr(c = \mathrm{normal} \mid y)$.
This is the probability that a message is normal; $c$ and $y$
denote realizations of random variables for a class and a
message, respectively. The odds ratio of the posterior is used
to obtain a measurable classification:

$$O_{post} = \frac{\Pr(c = \mathrm{normal} \mid y)}{\Pr(c = \mathrm{spam} \mid y)}$$

If $O_{post} > 1$, a message is classified as normal; otherwise, as
spam. Alternatively, we can simply use a threshold-based
approach on the posterior distribution. If $\Pr(c = \mathrm{normal} \mid y)$
is close to one, a message is likely to be normal; if close to
zero, it is likely to be spam. Let $c = f(y, h)$ be a spam
filter where $c$ and $h$ are the output and a given threshold,
respectively. This filter works with the following rules:

$$c = f(y, h) = \begin{cases} \mathrm{normal} & \text{if } \Pr(c = \mathrm{normal} \mid y) > h \\ \mathrm{spam} & \text{if } \Pr(c = \mathrm{normal} \mid y) < h \end{cases} \qquad (1)$$

This separates normal messages from spam (the odds ratio
approach is a special case where $h = 0.5$). The main problem
with this approach is finding a proper threshold; because
the ground-truth threshold is unknown, there are two
possible cases as shown in Fig. 2(a). If $h$ is higher than the
ground-truth threshold, some of the normal messages in region A
may be classified as spam. If $h$ is lower than the ground-truth
threshold, some of the spam in region B may bypass the filter
and reach the recipients. Such a threshold problem will always
be present in classification: it is almost impossible to find
the underlying ground truth, and anti-spam software companies
are likely to use strategies based on their own experience.
The sensitivity problem can be resolved by introducing an
uncertain region with two thresholds (see Fig. 2(b)). These can
be implemented as the upper and lower boundaries of a
traditional threshold system. There are three labels: spam,
uncertain and normal; we focus on the uncertain area. Spam and
normal regions are classified as in the traditional system.
Only the messages that fall into the uncertain area are checked
using challenge-response. In the next section we describe the
protocols in detail.

Figure 2: (a) Two possible cases: the chosen threshold $h$ lies
above (case 1) or below (case 2) the ground-truth threshold
(red dotted line). (b) Modified classification embedding an
uncertain area, given the ground-truth threshold (red dotted line).
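The two-threshold rule can be illustrated with a minimal Python sketch. The function name `classify` and the argument names `h1` (lower threshold) and `h2` (upper threshold) are assumptions made for this illustration; `kappa` stands for the filter output Pr(c = normal | y). Scores that fall between the thresholds (or exactly on one of them) are labelled uncertain and would be passed on to challenge-response.

```python
def classify(kappa: float, h1: float, h2: float) -> str:
    """Map a filter score kappa = Pr(c = normal | y) to one of three labels.

    h1 is the lower threshold and h2 the upper threshold (assumed names).
    """
    if not 0.0 <= h1 <= h2 <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= h1 <= h2 <= 1")
    if kappa > h2:
        return "normal"      # confidently normal: forward to the recipient
    if kappa < h1:
        return "spam"        # confidently spam: drop at the message center
    return "uncertain"       # needs challenge-response


if __name__ == "__main__":
    for score in (0.05, 0.5, 0.95):
        print(score, classify(score, h1=0.2, h2=0.8))
```

Setting `h1 == h2` collapses the uncertain region to a single point, which recovers the single-threshold filter of Eq. (1).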

3.2 Challenge-Response Protocols 

We assume that there is a Turing test available with a low
probability of producing false positives and negatives. The
Completely Automated Public Turing test to tell Computers and
Humans Apart (CAPTCHA) is a commonly used one: it generates
pattern matching problems that a human can easily solve
and a machine cannot. An automated program that
generates thousands of spam messages will not be capable of
answering a CAPTCHA-based challenge, which might be a
graphical image containing a faint typeface. If the response
is correct, there is a high probability that the sender is a
human. A CAPTCHA can be designed in a flexible manner
using different media forms such as an image, an audio file
or text [9]. Their implementation details are beyond the
scope of this paper.

A number of challenge-response protocols have already
been proposed [1, 8]. However, these focus only on the
implementation issues without considering the security model
and the cryptographic details. This section defines our se- 
curity models and describes a number of possible protocols 
in line with them. There are several issues we need to con- 
sider before designing the protocols. Firstly, in dealing with 
spam, message authentication and integrity are important; 
whereas, confidentiality is not. Secondly, text messages are 
usually unencrypted and unsigned; it is possible to tamper 
with them during transmission. Thirdly, security properties 
of the communication channel between a message center and 
a sender need to be defined; this channel might or might not 
be an authenticated one. Lastly, managing the session
information between all trusted pairs while performing
challenge-response would impose a huge storage overhead
on a message center; there might be more than one mes- 
sage center sharing this information; and it might or might 
not be stored in the center. Mindful of these security and 
scalability issues, we propose four different protocols: pro- 
tocols 3 and 4 have been designed with the assumption of 
an authenticated channel, and protocols 1 and 2 have not; 
moreover, protocols 1 and 3 assume that a message center 
manages the session information, and the others do not. 

3.2.1 Notations 

The symbols $S$ and $R$ represent a sender and a recipient,
respectively. $M$ represents a mobile message center,
$T$ a timestamp, $N$ a nonce, $K$ a key and $K^{-1}$ its inverse.
In a symmetric crypto-system such as AES, $K$ and $K^{-1}$
are always equal. We use $\{P\}_K$ for a plain text message
$P$ encrypted with $K$. $H$ is a one-way hash function. The
subscript $m$ in $K_m$ implies that $K_m$ is $M$'s public key. In
addition, $ms$ in $K_{ms}$ shows that $K_{ms}$ is intended for
communication between $M$ and $S$.

A sender's ability to respond to a challenge depends on
knowing and interpreting a key, $K_c^{-1}$. A non-authorized
sender (e.g. a program sending spam) will not be able to
interpret and gain information about $K_c^{-1}$; this key serves to
identify machine-generated spam. For simplicity, encryption
algorithms are not considered in our protocols.

3.2.2 Protocols 

In protocol 1, the message center, $M$, maintains the session
information.
[Protocol 1]

(M1) $S \to M : S, R, P$
(M2) $M \to S : M, S, \{K_{ms}\}_{K_c}, \{H(S,R,P), N\}_{K_{ms}}$
(M3) $S \to M : S, M, \{H(S,R,P), N+1\}_{K_{ms}}$

Before sending message 1, $S$ stores $R$ and $P$ to prevent
message modification attacks. After receiving message 1, $M$
generates $K_{ms}$ and stores $(S, R, P, K_{ms}, N)$ as the session
information. $K_{ms}$ is protected with $K_c$. An image CAPTCHA
would be one way of protecting $K_{ms}$ against spam programs.
After receiving message 2, $S$ decrypts $\{K_{ms}\}_{K_c}$ by
answering the challenge (their ability to interpret $K_c^{-1}$). $S$
then decrypts $H(S,R,P)$ and $N$ using $K_{ms}$. $S$ compares
$H(S,R,P)$ against the previously stored values. $S$ terminates
the protocol if these values do not match; otherwise,
$S$ generates $\{H(S,R,P), N+1\}_{K_{ms}}$ using $K_{ms}$ and sends it to
$M$. After receiving message 3, $M$ verifies
$\{H(S,R,P), N+1\}_{K_{ms}}$. If it is valid, $M$ forwards the stored
message $(S, R, P)$ to $R$. Finally, $M$ deletes the session
information.
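The message flow of protocol 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the protocol's cryptographic realization: an HMAC tag stands in for the encrypted response {H(S,R,P), N+1} under K_ms, the CAPTCHA wrapping of K_ms is omitted (the key is handed over in the clear), and sessions are indexed by sender only. The names `MessageCenter`, `digest` and `response_tag` are hypothetical.

```python
import hashlib
import hmac
import secrets


def digest(sender: str, recipient: str, text: str) -> bytes:
    """H(S, R, P): hash binding the text to its sender and recipient."""
    return hashlib.sha256("\x00".join((sender, recipient, text)).encode()).digest()


def response_tag(k_ms: bytes, msg_digest: bytes, nonce: int) -> bytes:
    """Stand-in for {H(S,R,P), N+1}_Kms: an HMAC over the digest and N+1."""
    payload = msg_digest + (nonce + 1).to_bytes(16, "big")
    return hmac.new(k_ms, payload, hashlib.sha256).digest()


class MessageCenter:
    def __init__(self) -> None:
        # sender -> (recipient, text, k_ms, nonce); one pending message per sender
        self.sessions = {}

    def receive(self, sender: str, recipient: str, text: str) -> dict:
        """Handle M1 and build M2: store the session, return the challenge."""
        k_ms = secrets.token_bytes(32)
        nonce = secrets.randbits(64)
        self.sessions[sender] = (recipient, text, k_ms, nonce)
        # In the protocol K_ms is wrapped as {K_ms}_Kc (e.g. inside a CAPTCHA);
        # it is returned in the clear here purely for illustration.
        return {"k_ms": k_ms, "digest": digest(sender, recipient, text), "nonce": nonce}

    def verify(self, sender: str, tag: bytes):
        """Handle M3: return (S, R, P) for forwarding if valid, else None.

        The session is deleted in either case.
        """
        session = self.sessions.pop(sender, None)
        if session is None:
            return None
        recipient, text, k_ms, nonce = session
        expected = response_tag(k_ms, digest(sender, recipient, text), nonce)
        if hmac.compare_digest(tag, expected):
            return (sender, recipient, text)
        return None
```

A sender that has recovered K_ms would compare the received digest with its own `digest(S, R, P)` and then return `response_tag(k_ms, digest, nonce)`; the center forwards the stored message only when `verify` succeeds.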

Users will be frustrated if challenge-response happens too
often. We use a timestamp, $T$, to solve this problem. After
receiving message 3, $M$ maintains the session information
$(S, R, P, K_{ms}, T)$ between $S$ and $R$ for a given time interval.
$M$ checks the validity of $K_{ms}$ using the session information
and a policy that defines the lifetime of $K_{ms}$.

The main drawback of this protocol is that $M$ has to bear
the huge overhead of maintaining the session information.
We describe another protocol which solves this issue by
using authorized tokens instead:

[Protocol 2]

(M1) $S \to M : S, R, P$
(M2) $M \to S : M, S, \{K_{ms}\}_{K_c}, \{H(S,R,P)\}_{K_{ms}}, \{K_{ms}, H(S,R), T\}_{K_m^{-1}}$
(M3) $S \to M : S, R, \{P\}_{K_{ms}}, \{K_{ms}, H(S,R), T\}_{K_m^{-1}}$

The key difference is the use of $\{K_{ms}, H(S,R), T\}_{K_m^{-1}}$ (which
can only be generated by $M$) as the authorization token for
verifying a response. $M$ checks whether $S$ is authorized by
looking at $\{K_{ms}, H(S,R), T\}_{K_m^{-1}}$. Using this token, $S$ can
just send message 3 alone, including a new text ($P'$), within
the lifetime of $T$:

(M1) $S \to M : S, R, \{P'\}_{K_{ms}}, \{K_{ms}, H(S,R), T\}_{K_m^{-1}}$
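The stateless token check of protocol 2 can be sketched as below. As an assumption made to keep the example within the Python standard library, the signature under M's private key is replaced by an HMAC under a secret known only to M (`m_secret`); this preserves the property that only M can generate a valid token, but it is not a public-key signature. The function names, the integer timestamp `issued_at` and the `lifetime` policy parameter are all hypothetical.

```python
import hashlib
import hmac


def pair_hash(sender: str, recipient: str) -> bytes:
    """H(S, R): binds a token to one sender/recipient pair."""
    return hashlib.sha256(f"{sender}\x00{recipient}".encode()).digest()


def _tag(m_secret: bytes, k_ms: bytes, h_sr: bytes, issued_at: int) -> bytes:
    """HMAC over (K_ms, H(S,R), T), standing in for M's signature."""
    payload = k_ms + h_sr + issued_at.to_bytes(8, "big")
    return hmac.new(m_secret, payload, hashlib.sha256).digest()


def issue_token(m_secret: bytes, k_ms: bytes, sender: str, recipient: str,
                issued_at: int) -> tuple:
    """Build the token (K_ms, H(S,R), T, tag) that M hands to S."""
    h_sr = pair_hash(sender, recipient)
    return (k_ms, h_sr, issued_at, _tag(m_secret, k_ms, h_sr, issued_at))


def token_valid(m_secret: bytes, token: tuple, sender: str, recipient: str,
                now: int, lifetime: int) -> bool:
    """Check authenticity, the sender/recipient binding and the lifetime of T."""
    k_ms, h_sr, issued_at, tag = token
    if not hmac.compare_digest(tag, _tag(m_secret, k_ms, h_sr, issued_at)):
        return False                      # not generated by M, or tampered with
    if not hmac.compare_digest(h_sr, pair_hash(sender, recipient)):
        return False                      # token issued for a different pair
    return 0 <= now - issued_at <= lifetime
```

Because everything M needs is carried inside the token, M keeps no per-session state; a sender can reuse the token for new texts until the lifetime expires.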

In these protocols, however, $S$ cannot find out where the
challenge comes from. In an attempt to solve this problem,
we assume there is an authenticated channel between $M$ and
$S$, and that $M$'s public key $K_m$ is securely installed in the
mobile device owned by $S$, perhaps during manufacturing.
We describe the following two protocols based on
these assumptions:

[Protocol 3] 

(M1) $S \to M : S, R, P$
(M2) $M \to S : M, S, \{\{K_{ms}\}_{K_c}, N\}_{K_m^{-1}}$
(M3) $S \to M : S, R, \{N+1\}_{K_{ms}}$

In protocol 3, $M$ maintains the session information,
$(S, R, P, K_{ms}, N)$. When message 2 arrives, $S$ verifies
the signature on $\{\{K_{ms}\}_{K_c}, N\}_{K_m^{-1}}$. $S$ does not
respond if the signature is unknown.

[Protocol 4] 

(M1) $S \to M : S, R, P$
(M2) $M \to S : M, S, \{\{K_{ms}\}_{K_c}, H(S,R), T\}_{K_m^{-1}}$
(M3) $S \to M : S, R, \{P\}_{K_{ms}}, \{\{K_{ms}\}_{K_c}, H(S,R), T\}_{K_m^{-1}}$

Protocol 4 uses $\{\{K_{ms}\}_{K_c}, H(S,R), T\}_{K_m^{-1}}$ as the
authorized token. Our protocols are likely to be compatible with
existing devices since the majority already have built-in
encryption and hash functions.

3.3 Observations 

3.3.1 Upgrading Protocols 

A message is always sent to the message center of the
contracted operator first. If the message is directed at someone
contracted to a different operator, it is forwarded to another
message center before reaching the receiver's handset [10].
This means that if one of the message centers decides not to use
our framework, all uncertain texts delivered via that center
would bypass the spam filter. It would be the weakest
point (and the only route needed) for an attack. Hence,
all existing message centers would have to support the new
protocol. While this is a large and challenging change,
operator-sponsored forums like OMTP (Open Mobile
Terminal Platform) are working with key mobile operators
to unify and recommend mobile terminal requirements [11].
With the increasing number of spam texts, it seems likely
that the ability to filter machine-generated uncertain texts
will persuade operators to upgrade their systems.

3.3.2 Performance 

If there are too many messages subject to challenge-response, 
its overhead will dominate; for example, sending an image 
CAPTCHA is a huge overhead to authenticate a 100 char- 
acter text message. Future work may look at adding a 'by- 
pass' to the hybrid: for example, if a message begins with a 
user-settable password (typically the recipient's name, but 
changeable), then it should be automatically treated as nor- 
mal. As the uncertain region becomes smaller, we expect 
the performance of our framework to improve. 

3.3.3 Usability Issues 

Adopting CAPTCHA methods will have implications for
usability. A device might not have the capability to display
an image CAPTCHA to a readable standard; also, a mobile
user might find it difficult to verify an audio CAPTCHA
because of background noise. These issues, however, are
likely to be resolved with technological advances. According
to the Kelsey Group's second annual study on mobile use,
more users than ever own internet-enabled smartphones:
phones which provide advanced information access functions.

4. EVALUATION 

4.1 Description of Datasets 

In order to measure the performance of our framework,
we have generated synthetic datasets. Suppose that there
are $N$ sent messages (we set $N = 5000$). We
use $p$ and $q$ to denote the proportions of normal and spam
messages, where $p + q = 1$ and $p$ and $q$ are non-negative
(in reality, there would be many operators with different
proportions). Let $\kappa$ be a random variable generated from
an existing filtering method: $\kappa = \Pr(c = \mathrm{normal} \mid y)$. For an
artificial dataset, we build a mixture model given by

$$p(\kappa \mid \lambda) = p(\kappa \mid c = \mathrm{normal}, \lambda)\, p(c = \mathrm{normal} \mid \lambda) + p(\kappa \mid c = \mathrm{spam}, \lambda)\, p(c = \mathrm{spam} \mid \lambda) \qquad (2)$$

where $\lambda$ denotes a set of hyper-parameters which control the
parameters. We assume $c_i$ is generated from a binomial
distribution with hyper-parameters $p$ and $q$. Thus, we have:

$$c \sim p(c \mid \lambda) = \Pr(c \mid 1, p) = p^{c}(1 - p)^{1-c} = p^{c} q^{1-c}$$

After classifying the $i$th sample message, we generate the
expected probability (this is the filtering output):

$$p(\kappa_i \mid c_i, \lambda) = \begin{cases} p(\kappa \mid c = \mathrm{normal}, \lambda) = \mathcal{B}(\kappa; \alpha_1, \beta_1) \\ p(\kappa \mid c = \mathrm{spam}, \lambda) = \mathcal{B}(\kappa; \alpha_0, \beta_0) \end{cases}$$

Here, $\mathcal{B}$ represents the beta distribution and its
hyper-parameters are set as follows: $\alpha_0 = 3$, $\beta_0 = 5$,
$\alpha_1 = 5$, $\beta_1 = 2$. Both thresholds ($h_1$ and $h_2$) vary
between 0 and 1 in steps of 1/30.
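The generative model above can be sketched with Python's standard library. This is a minimal illustration, assuming that each class label is drawn independently with probability p of being normal and that the filter output is then drawn from Beta(5, 2) for normal messages and Beta(3, 5) for spam, as stated above. The function name `generate_scores` and the `seed` argument are hypothetical.

```python
import random


def generate_scores(n: int, p_normal: float, seed: int = 0) -> list:
    """Draw n synthetic (is_normal, kappa) pairs from the two-component mixture.

    is_normal is the ground-truth class; kappa plays the role of the filter
    output Pr(c = normal | y).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        is_normal = rng.random() < p_normal      # class label with Pr(normal) = p
        if is_normal:
            kappa = rng.betavariate(5, 2)        # alpha_1 = 5, beta_1 = 2
        else:
            kappa = rng.betavariate(3, 5)        # alpha_0 = 3, beta_0 = 5
        samples.append((is_normal, kappa))
    return samples


if __name__ == "__main__":
    data = generate_scores(5000, p_normal=0.8532)
    normal_scores = [k for is_normal, k in data if is_normal]
    spam_scores = [k for is_normal, k in data if not is_normal]
    print(len(normal_scores), len(spam_scores))
```

The two beta components overlap substantially in the middle of the unit interval, which is what produces the uncertain region discussed in this section.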




Figure 3: Displaying $\kappa = \Pr(c = \mathrm{normal} \mid y)$ for $N$ messages:
spam (14.57%) and normal (85.32%). (a) $N$ messages, plotted
against the message index $i$; (b) distribution of
$\Pr(c = \mathrm{normal} \mid y)$.



We have built an artificial dataset based on a Spanish
database [7] which shows the proportion of spam as 14.57%
and normal as 85.32%; that is, $q = 0.1457$ and $p = 0.8532$.
The generated data have been plotted in Fig. 3(a). A red
cross represents a normal message and a blue circle represents
spam. This colouring scheme is also used in Fig. 3(b).
The graphs show that there are a lot of overlapping labels
between 0.2 and 0.8. This overlapping section is considered
as the uncertain region. Note that challenge-response is
not perfect: some of the spam might bypass the filter with
correct responses, and normal messages might be filtered out
with incorrect ones. To model this imperfection, we use $e_1$
and $e_2$ to represent the ratios of False Positives (FP) and
False Negatives (FN) in the uncertain region.

4.2 Traffic Usage Comparison 

We have simulated and analyzed the traffic usage using the
variable thresholds. Our framework considers three major
stakeholders (see Fig. 1): a message sender (A), a message
center (B), and a receiver (C).

First, we calculate the traffic used by an existing filtering
method. Only the messages with filtering probabilities
higher than the threshold $h$ reach C via B; other messages
are deleted at B (only A to B). Suppose that $y_{\hat{c}=type}$ for
$type \in \{\mathrm{normal}, \mathrm{spam}\}$ denotes all messages filtered as
$type$ in terms of $h$. The total amount of traffic used is the sum
of $|y_{\hat{c}=\mathrm{normal}}| \times 2$ and $|y_{\hat{c}=\mathrm{spam}}| \times 1$, where
$|\cdot|$ represents the cardinality of a set:

$$N_{FilteringOnly} = |y_{\hat{c}=\mathrm{normal}}| \times 2 + |y_{\hat{c}=\mathrm{spam}}| \times 1$$

This is because a normal message has two paths (A → B and
B → C) but spam only has one path (A → B). In contrast, our
hybrid model divides the measurable space into three different
areas using two thresholds: $h_1$ and $h_2$. As a result, we have
two more parameters to estimate: the traffic used by normal
messages ($N_{un}$) and spam ($N_{us}$) in the uncertain region. Let
$y_{c=type}$ for $type \in \{\mathrm{normal}, \mathrm{spam}\}$ be the set of messages
that have label $type$ as a ground truth.

As Fig. 4 shows, there are four possible pathways. Firstly,
a message classified as normal using the higher threshold is
sent directly to C via B; the number of paths taken is two
(A → B → C). Secondly, a message is in between the higher
and the lower thresholds; a correct response is submitted
by the sender and the message is classified as normal; the
number of paths taken is four (A → B → A → B → C).
Thirdly, a message is again in between the two thresholds; this
time no response is returned and the message is classified
as spam; the number of paths taken is two (A → B → A).
Lastly, a message classified as spam using the lower
threshold is deleted at B; the number of paths taken is one
(A → B).
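The four pathways translate directly into a per-message hop count. The sketch below is illustrative only: it counts hops for individual messages and does not model the error ratios of the challenge-response step. The function names and the boolean `responded` (whether a correct response came back) are assumptions made for this example.

```python
def hops_hybrid(kappa: float, h1: float, h2: float, responded: bool) -> int:
    """Number of paths a message takes under the hybrid scheme."""
    if kappa > h2:
        return 2                      # A -> B -> C
    if kappa < h1:
        return 1                      # A -> B, deleted at B
    # Uncertain region: a challenge is sent back to the sender.
    return 4 if responded else 2      # A -> B -> A -> B -> C, or A -> B -> A


def hops_filtering_only(kappa: float, h: float) -> int:
    """Number of paths under a single-threshold filter."""
    return 2 if kappa > h else 1      # A -> B -> C, or A -> B only


if __name__ == "__main__":
    # (filter score, whether the sender answered the challenge correctly)
    messages = [(0.9, True), (0.5, True), (0.5, False), (0.1, False)]
    hybrid = sum(hops_hybrid(k, 0.2, 0.8, r) for k, r in messages)
    filtering = sum(hops_filtering_only(k, 0.2) for k, _ in messages)
    print(hybrid, filtering)
```

The comparison makes the cost structure visible: every message in the uncertain region costs at least as much as under filtering alone, so the hybrid scheme only pays off in traffic terms when the uncertain region is kept small or when it deters enough spam.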

The traffic usage is calculated using:

(3)

where $e_1$ and $e_2$ are the probabilities that normal senders
cannot respond. Here, $e_1$ and $e_2$ are fixed to 0.02 and
0.01, respectively.

Figures 5(a) and 5(b) show the inner sections of the
graph. The higher threshold is fixed to 0.73333, and only the
lower threshold increases from 0 until it reaches this value.
Less traffic is used as the lower threshold increases.
We have also monitored the traffic usage with the lower
threshold fixed to 0.1, and with the higher threshold
increased from this value to 1 (see Fig. 5(a)). The traffic
usage of the filtering approach does not change because its
threshold $h$ is the same as the lower threshold. As the number
of messages in the uncertain region increases, so does the
traffic usage.

Figure 5: Slides of an axis (with fixed threshold)

4.3 Variant proportion of spam 

So far, we have fixed the proportion of spam to 14.57% and
normal messages to 85.32%. In this section we show how the
performance is affected when these proportions change.

Table 1 describes a small number of samples from the
nine different proportions. Each record has six
columns: proportion of spam (%), lower threshold ($h_1$),
higher threshold ($h_2$), traffic usage (TA) of $N_{hybrid}$, ratio
($= N_{hybrid} / N_{FilteringOnly}$), and accuracy ($ACC = \frac{TP + TN}{N}$).
It uses three different measures for the performance. If the
traffic usage is lower, we say the system is lighter and more
economical. The ratio is close to 1 only if the traffic used in
the hybrid method is close to the amount used in the
filtering-only method. The accuracy measures the correctness of
message classification. We can select practical threshold values
for each spam proportion to compare the performance. For
instance, threshold values $h_1 = 0.1$ and $h_2 = 0.2$ can be selected
for a 10% spam proportion to show a reasonable performance
of the hybrid method. However, if the system is concerned
with achieving high accuracy and not with reducing the traffic
usage, $h_1 = 0.1$ and $h_2 = 0.9$ can be used. In a
spam-dominant environment (a spam proportion of 50%),
reasonable threshold values would be $h_1 = 0.4$ and $h_2 = 0.6$.
Returning to the figures for a spam proportion of 10%,
$h_1 = 0.1$ and $h_2 = 0.9$ will be selected when accuracy is the
most significant factor.

Table 1: Traffic amounts and accuracy of hybrid
methods in terms of thresholds
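The accuracy and ratio measures can be computed as in the following minimal sketch. It assumes that "normal" is treated as the positive class, so that TP counts normal messages delivered as normal and TN counts spam classified as spam, and that accuracy is taken over all N messages. The function names are hypothetical.

```python
def accuracy(truth: list, predicted: list) -> float:
    """(TP + TN) / N, with "normal" as the positive class (an assumption)."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("truth and predicted must be non-empty and equal in length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == "normal" and p == "normal")
    tn = sum(1 for t, p in zip(truth, predicted) if t == "spam" and p == "spam")
    return (tp + tn) / len(truth)


def traffic_ratio(n_hybrid: float, n_filtering_only: float) -> float:
    """Traffic of the hybrid method relative to filtering alone."""
    return n_hybrid / n_filtering_only


if __name__ == "__main__":
    truth = ["normal", "spam", "normal", "spam"]
    predicted = ["normal", "spam", "spam", "spam"]
    print(accuracy(truth, predicted), traffic_ratio(9, 7))
```

Evaluating both measures over a grid of threshold pairs is one way to pick operating points such as those discussed above: a wide uncertain region favours accuracy, a narrow one favours low traffic.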

5. CONCLUSION AND FUTURE WORK 

We proposed a hybrid spam filtering framework for mobile
communication using a combination of content-based filtering
and challenge-response. A message that falls into the
uncertain region (after filtering) is further classified by
sending a challenge (e.g. an image CAPTCHA) to the sender: a
legitimate sender is likely to answer it correctly, whereas an
automated spam program is not. Challenge-response protocols
have been designed with the necessary cryptographic
features. We have also shown the trade-offs between the
accuracy and the traffic usage in using our framework. The
simulation results suggest that, for different levels of spam
proportion, the practical thresholds should be carefully
selected according to the required level of the two measures.

In this paper, a synthetic dataset, as opposed to a real
dataset, has been used for the following three reasons:
firstly, we wanted to develop a generalized framework that
is flexible and applicable to a wide range of applications (e.g.
VoIP spam filters [12]); secondly, it was not easy to find a real
dataset since the challenge-response protocol is not a
commonly used filtering method; lastly, this protocol involves a
great level of human interaction, and developing such a
prototype (in order to generate our own dataset) was outside
the scope. Our next step will be to contact mobile operators
and forums like OMTP to collect real data, and evaluate our
framework against other datasets.

Having the network operators charge for the sending of text
messages has been one of the big inhibitors to the growth
of spam: even a cent per message might hugely alter the
economics of a spammer. Assuming that a reasonable filtering
method is in place, another potential hybrid is to force
spammers to opt into a charging scheme where the cost of
responding to a challenge is larger than that of sending the
initial spam. For example, if it costs two cents to send a
spam message, then it would cost an extra five cents to answer
an image CAPTCHA.